# SVD and cat pictures

I like to showcase what linear algebra can do. In the following, we will see how we can **compress** an image file using something called the **singular value decomposition** of a matrix.

Let us say we have an image file, say **cat.jpg**:

![[1 teaching/smc-2025-spring/notes/---files/0b-svd-and-cat-pictures 2025-02-19 16.36.28.excalidraw.svg]]

Now, a computer image really is just a rectangular grid of numbers. Well, three grids of numbers, one for each color channel (blue, green, and red). Each grid records a number that represents how much of that color is present at that location. Each tiny grid location is called a **pixel**.

![[1 teaching/smc-2025-spring/notes/---files/0b-svd-and-cat-pictures 2025-02-19 16.44.35.excalidraw.svg]]

So we see a picture is just a rectangular table of numbers. Great. To simplify our example, we will only consider a **grayscale** image, meaning we just have one channel recording the lightness or darkness of the image.

So now let us consider a (grayscale) image of size $m \times n$, that is, an image with $m$ rows and $n$ columns. Then this is just a **matrix** of shape $m \times n$, something that looks like this
$$A = {\text{\small $m$ rows}}\left\{ \begin{matrix} \, \\ \, \\ \, \\ \, \end{matrix}\right.\underbrace{\begin{bmatrix}255 & 210 & 121 & \cdots \\ 212 & 32 & 142 & \cdots \\ \vdots & \ddots \\ 31 & 132 & 54 & \cdots\end{bmatrix}}_{n \text{ columns}}$$
Let us call this matrix $A$. Since $A$ has shape $m \times n$, we can think of it as holding $mn$ many numbers; this is in some sense its "file size". That is, this $m\times n$ matrix $A$ holds $mn$ many pieces of information.

Now if we want to **compress** this image $A$, one way is to **throw away information**. But how do we do that? Certainly we could just crop (delete) half of the image, and that would cut down the information by $50\%$. But that's a bit drastic. Can we compress the image by $50\%$ in another way?

As it turns out, there is a way to **factor** or **decompose** a matrix $A$ called the **singular value decomposition**. What it does is factorize the matrix $A$ into a product of three matrices, $U$, $S$, and $V$, like this
$$ A = USV. $$
We will discuss what matrix multiplication is later. But here is a description of what each matrix is; pay attention to their sizes for now:

- $U$ is an $m\times m$ matrix; in particular its columns are *orthonormal*, whatever that means.
- $S$ is an $m \times n$ matrix, **where all the entries are always zero, except possibly along its top-left to bottom-right diagonal**. In fact, $S$ looks something like this
$$ S = \begin{bmatrix}\sigma_{1} & 0 & 0 & \cdots \\ 0 & \sigma_{2} & 0 & \cdots\\0 & 0 & \sigma_{3} & \cdots\\\vdots & & & \ddots\end{bmatrix} $$
  These numbers $\sigma_{1}$, $\sigma_{2}$, $\sigma_{3}$, ... are called the **singular values** of the matrix $A$. Typically we arrange them from largest to smallest, like so: $\sigma_{1} \ge \sigma_{2} \ge \sigma_{3}\ge\cdots$.
- $V$ is an $n\times n$ matrix; in particular its rows are orthonormal, whatever that means.
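Here is a minimal numpy sketch of this factorization, just to make the shapes concrete (the small matrix of gray values is made-up numbers). Note that `np.linalg.svd` returns the singular values as a 1-D array, already sorted from largest to smallest, and its third output already has orthonormal *rows*, matching the $V$ described above:

```
import numpy as np

# A tiny made-up "image": a 4 x 6 matrix of gray values
A = np.array([
    [255, 210, 121,  90,  34,  12],
    [212,  32, 142, 200, 180,  90],
    [ 31, 132,  54,  77,  66,  43],
    [ 18,  99, 240, 160,  20,   5],
], dtype=float)

m, n = A.shape

# U is m x m, S is the list of singular values (largest first), V is n x n
U, S, V = np.linalg.svd(A)
print(U.shape, S.shape, V.shape)  # (4, 4) (4,) (6, 6)
print(S)                          # sigma_1 >= sigma_2 >= sigma_3 >= sigma_4

# Rebuild the m x n matrix S described above and check that A = U S V
S_matrix = np.zeros((m, n))
np.fill_diagonal(S_matrix, S)
print(np.allclose(A, U @ S_matrix @ V))  # True (up to rounding)
```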
As it turns out, the singular values indicate which column-row combinations from the $U$, $V$ matrices are most "significant". That is, the larger the value $\sigma_{i}$ is, the more significant the $i$-th column of $U$ and the $i$-th row of $V$ are in describing the original matrix $A$.

This gives us a clue for how to compress our image $A$! If the smaller singular values contribute less to describing the matrix $A$, then we can set them to zero, as a way of **approximation**. Effectively, this **deletes** the columns of $U$ and rows of $V$ that contribute less to describing what $A$ is.

So let us say we keep just the $k$ largest singular values in $S$. This corresponds to keeping just the first $k$ columns of $U$ and the first $k$ rows of $V$, giving us the following approximation
$$ A = USV \approx \tilde U \tilde S \tilde V $$
where

- $\tilde U$ is an $m \times k$ matrix, obtained by keeping just the first $k$ columns of $U$.
- $\tilde S$ is a $k \times k$ matrix, obtained by keeping just the top-left $k\times k$ corner of $S$.
- $\tilde V$ is a $k \times n$ matrix, obtained by keeping just the first $k$ rows of $V$.

The resulting matrix product $\tilde U \tilde S \tilde V$ is still an $m\times n$ matrix, but with a different amount of information kept:

- Since $\tilde U$ has shape $m\times k$, there are $mk$ many numbers here.
- Since $\tilde S$ is the top-left $k\times k$ corner of $S$, there are only $k$ numbers here (remember, the other entries of $S$ are zero!).
- Since $\tilde V$ has shape $k \times n$, there are $kn$ many numbers here.

So in total, the approximation $\tilde U \tilde S \tilde V$ keeps just
$$ mk + k + kn = (m+n+1)k $$
many numbers. You can think of this as the new "file size". That is, we have compressed the original matrix $A$ to a **fraction** of
$$ \text{fraction}=\frac{(m+n+1)k}{mn}. $$
In the trade, one often speaks of the **compression ratio**, which is given by
$$ \text{compression ratio}= \frac{\text{uncompressed size}}{\text{compressed size}}, $$
which in this case is
$$ \text{compression ratio} = \frac{mn}{(m+n+1)k}. $$

For example, if an image is of size $100\times 200$, and we keep $k=50$ of its largest singular values, then we have a fraction of
$$ \frac{(100+200+1)\cdot50}{100\cdot 200}\approx 0.75. $$
That is, the compressed image is about $75\%$ of the original size! And if we keep just $k=10$ of its largest singular values, then the fraction is now
$$ \frac{(100+200+1)\cdot10}{100\cdot 200}\approx 0.15, $$
wow! $15\%$ of the original size! What savings!

Let us try this on our cat.jpg image. The original cat image I used here has size $1500\times 1435$. Using **python code** (given below) along with the numpy numerical linear algebra library to compute the SVD for us, we get the following results by keeping $k=50$ and $k=10$ singular values:

![[1 teaching/smc-2025-spring/notes/---files/Pasted image 20250219202807.png]]

In the $k=50$ case, the fraction is about $0.068$, and in the $k=10$ case the fraction is about $0.013$, which are **very significant savings!** Isn't this neat! The idea is, the larger the singular value, the more significant the corresponding column of $U$ and row of $V$ are, and they capture the major features of the original matrix $A$. So keeping just the larger ones gives us a good approximation of the original matrix.
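If you want to check these storage counts yourself, here is a small helper (a sketch; the function name `compression_fraction` is just something made up for illustration) that computes the fraction $(m+n+1)k/(mn)$ for a given image size and choice of $k$:

```
def compression_fraction(m, n, k):
    # numbers kept: m*k (for U~), k (for S~), k*n (for V~), out of m*n originally
    return (m + n + 1) * k / (m * n)

print(compression_fraction(100, 200, 50))    # about 0.75
print(compression_fraction(100, 200, 10))    # about 0.15
print(compression_fraction(1500, 1435, 50))  # about 0.068, the k=50 cat example
print(compression_fraction(1500, 1435, 10))  # about 0.013, the k=10 cat example
```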
Now, this type of compression is a **lossy compression**: information is actually lost by discarding the smaller singular values, and we cannot reconstruct the original from the compressed file. But in applications where approximations are acceptable, like images, audio, video, and other visual or perceptual data, this is fine within reason. If, however, your data is such that every piece of information is essential, then lossy compression would not be suitable (you would need a **lossless compression**, like Huffman coding).

The python code we used is here (you'll need to supply your own cat.jpg image file and put it in the same folder):

```
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load the image and convert it to grayscale
image = Image.open("cat.jpg").convert("L")
image_array = np.array(image)

# Perform SVD
U, S, V = np.linalg.svd(image_array)

# Display the original and approximate images (using a subset of singular values)
k = 50  # Number of largest singular values to keep
approximate_image = U[:, :k] @ np.diag(S[:k]) @ V[:k, :]

plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.imshow(image_array, cmap="gray")
plt.title("Original Image")

plt.subplot(1, 2, 2)
plt.imshow(approximate_image, cmap="gray")
plt.title(f"Approximate Image (k={k})")

plt.show()
```
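If you want to actually write the compressed representation to disk, here is one possible continuation of the script above (a sketch; the file name `cat_compressed.npz` is made up). It saves only the truncated factors, which is exactly the $(m+n+1)k$ numbers we counted earlier:

```
# Continues from the script above: U, S, V, and k are already defined.
# Save only the truncated factors -- this is our "compressed file".
np.savez_compressed("cat_compressed.npz", U=U[:, :k], S=S[:k], V=V[:k, :])

# Later, load the factors back and rebuild the approximate image.
data = np.load("cat_compressed.npz")
approx = data["U"] @ np.diag(data["S"]) @ data["V"]
print(approx.shape)  # same shape as the original image
```

On top of the SVD truncation, `np.savez_compressed` also applies ordinary lossless compression to the saved arrays, so the file on disk may end up even smaller than the $(m+n+1)k$ count suggests.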